Fix potential deadlock in TimeSource::destroy_clock_sub (#2962)#3167
Open
PavelGuzenfeld wants to merge 2 commits into
Open
Fix potential deadlock in TimeSource::destroy_clock_sub (#2962)#3167PavelGuzenfeld wants to merge 2 commits into
PavelGuzenfeld wants to merge 2 commits into
Conversation
destroy_clock_sub() held clock_sub_lock_ while calling clock_executor_thread_.join(). If the executor thread's shutdown path contends on clock_sub_lock_, this produces a deadlock. Fix: move the thread into a local variable under the lock, release the lock, then join outside the critical section. The thread is not dangling — it is unconditionally joined before the function returns. Key improvement over the naive move approach: the subscription is kept alive until after join completes, ensuring the executor thread can finish any in-flight callback without accessing a destroyed subscription. Cleanup order: cancel → (release lock) → join → remove_callback_group → (reacquire lock) → reset subscription. Fixes ros2#2962 Signed-off-by: Pavel Guzenfeld <pavelguzenfeld@gmail.com> Generated-by: Claude Opus 4.6 (Anthropic)
Tests reproduce the conditions from ros2#2962: - Rapid construct/destroy of nodes with use_sim_time=true (50 cycles) - Concurrent construct/destroy from 4 threads - Toggle use_sim_time while spinning - Verify no lingering thread after destruction If the deadlock is present, these tests will hang and be caught by the 60-second timeout. Signed-off-by: Pavel Guzenfeld <pavelguzenfeld@gmail.com> Generated-by: Claude Opus 4.6 (Anthropic)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #2962
destroy_clock_sub()heldclock_sub_lock_acrossclock_executor_thread_.join(). If the executor thread's shutdown path needsclock_sub_lock_, the two deadlock: the main thread waits on the join while the executor thread waits on the lock.This moves the thread, executor, and callback group into locals under the lock, releases the lock, then joins outside the critical section. The subscription is kept alive until after the join, so the executor thread can finish an in-flight callback without touching a destroyed subscription, and the thread is always joined before the function returns.
Order: cancel → release lock → join → remove_callback_group → reacquire lock → reset subscription.
Adds
test_time_source_deadlock.cpp: rapid construct/destroy withuse_sim_time=true, concurrent construct/destroy across threads, togglinguse_sim_timewhile spinning, and a check for no lingering thread. If the deadlock is present these hang and trip the 60 s timeout.Some of this change was produced with Claude Opus 4.6 (Anthropic).